Abstract



In fall 1992 the National Association of Test Directors (NATD) surveyed its members on 
their involvement in the development of performance assessments and scoring rubrics. 
The purpose of this paper is to examine the extent to which members have developed 
performance assessments, how they went about doing so, the advice they would offer 
others who are developing performance assessments, and the nature of the scoring 
rubrics they developed. Slightly fewer than half the respondents had developed performance 
assessments, mostly in writing. They recommend extensive teacher involvement and 
adequate time as essential to the development process. A table summarizing the 
attributes of the scoring rubrics they submitted is provided. 



Introduction 



In fall 1992 the National Association of Test Directors (NATD) surveyed its members on their 
involvement in the development of performance assessments and scoring rubrics. Follow-up 
questionnaires were mailed and responses were received from 64 members, about a third of the 
total membership. The purpose of this paper is to examine the extent to which members have 
developed performance assessments, how they went about doing so, the advice they would offer 
others who are developing performance assessments, and the nature of the scoring rubrics they 
developed. 

Who responded? How many have developed performance assessments? 

About three-quarters of the respondents (49 of the 64, or 76.6%) are employed by local 
educational agencies (LEAs).¹ The remainder are divided among colleges and universities, 
educational service districts, state educational agencies (SEAs), consultants, and other types of 
organizations. Because the non-LEA subgroups are diverse and each is small, most of the 
analyses will focus on either the entire group or on the LEA respondents. 
Of the LEA group, 21 of the 49 respondents represent school systems with an enrollment of more 
than 35,000.² The enrollments of the districts represented range from 2,800 to 618,000. 

[Insert Table 1 about here] 

Table 1 gives a breakdown by organizational affiliation of the respondents who have and have 
not developed performance assessments. Slightly fewer than half the respondents (43.8%) report 
that they have developed performance assessments. It is not clear how well that percentage 
generalizes to other NATD members, but the actual percentage may well be lower if one 
assumes that people who have developed performance assessments are more likely than others 
to fill out a questionnaire with the heading "NATD Performance Assessment Survey." 

Among the LEA respondents, 22 of the 49 (44.9%) have developed performance assessments. 
The rate differs little for large districts (42.9%) and smaller districts (46.4%). 
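The rates reported above are simple proportions of respondents. The short Python sketch below recomputes them from the counts in Table 1 and the text. The counts for large (9) and smaller (13) districts are back-calculated from the reported percentages and group sizes (42.9% of 21 and 46.4% of 28), so treat them as an assumption rather than as figures stated in the survey.

```python
# Recompute the development rates from counts.
# Each entry is (number who developed performance assessments, number responding).
# The large/smaller district counts are back-calculated from the reported
# percentages (42.9% of 21, 46.4% of 28); they are not stated in the survey.
GROUPS = {
    "All respondents": (28, 64),
    "LEA respondents": (22, 49),
    "Large LEAs (35,000 or more)": (9, 21),
    "Smaller LEAs": (13, 28),
}


def percent(developed: int, responding: int) -> float:
    """Return the share of respondents who developed assessments, in percent."""
    if responding <= 0:
        raise ValueError("responding must be positive")
    return 100.0 * developed / responding


if __name__ == "__main__":
    for name, (developed, responding) in GROUPS.items():
        print(f"{name}: {developed}/{responding} = {percent(developed, responding):.1f}%")
```

Rounded to one decimal place, these counts give 43.8%, 44.9%, 42.9%, and 46.4%, the figures reported in the text.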

In what subjects and for what grades have performance assessments been developed? 

Writing is by far the area in which the greatest number of performance assessments are being 
developed (see Table 2). In fact, of the 28 members who had developed performance 
assessments, 24 had developed writing assessments. Reading and mathematics run a distant 
second and third. There was curiously little development reported in some of the areas that have 
been widely regarded as lending themselves well to performance assessment: science, social 
studies, the fine arts, listening, speaking and foreign languages. This doesn't necessarily mean 
that performance assessments are not conducted in those areas, but it might indicate that 
development of such assessments is taking place at the school or classroom level (NATD 
members employed by LEAs are generally part of their district's central administration) or that 
schools are using assessments from other sources (e.g., state assessments, assessments 
purchased from publishers). 

¹This percentage is near the proportion of all NATD members who work for LEAs. Of the members 
listed in the 1991 NATD directory (the most recent available), 74.9% listed an LEA as their primary 
affiliation. 

²35,000 is the enrollment needed to qualify for membership in the Council of the Great City Schools. 
LEAs with an enrollment of 35,000 or greater will be considered "large" in subsequent analyses. 

[Insert Table 2 about here] 

Figure 1 gives frequencies and the mean number of subjects in which assessments were 
developed by LEAs. Of the 22 LEA respondents who developed performance assessments, 9 
developed assessments in only one subject area and 8 developed assessments in two subjects. 
Just under one-fourth report developing assessments in three or more subjects. The average 
number of subjects was 0.94; considering only the respondents who had developed at least one 
performance assessment, the average number of subjects was 2.11. 

[Insert Figure 1 about here] 

The grade levels for which performance assessments are developed vary by subject area (see 
Figure 2). For the LEA respondents, most of the development in reading and all of the math is 
concentrated in grades K-3. Writing and speaking assessments are more evenly distributed 
across the grades. 

[Insert Figure 2 about here] 

Does school system size affect the number of performance assessments developed? 

Table 3 shows the number and percentage of large and smaller school systems that have 
developed performance assessments. Performance assessments in at least one subject were 
developed by 42.9% of the larger systems (N = 21) and 46.4% of the smaller ones (N = 28). 
Although about the same proportion of large and smaller school systems developed math 
performance assessments, larger districts were more likely than smaller ones to have developed 
writing assessments and less likely to have developed reading assessments, though the 
differences were not statistically significant. 

[Insert Table 3 about here] 

Figure 1, which gives information on the number of subjects in which the respondents developed 
performance assessments, presents data for large and smaller school districts. The mean 
numbers of subject areas are almost identical for the two groups. 

For LEAs, are performance assessments developed primarily at the classroom, school, or 
district level? 

Twenty-one members affiliated with LEAs responded to this item. The classroom and district 
levels were cited with equal frequency (57.1%) as the primary locus for development of 
performance assessments. Development at the school level was mentioned only half as often 
(28.6%). The only difference between large and small districts was that respondents from the 
smaller LEAs were more likely to say that the classroom was a primary locus of development. 
It is not clear whether less development is going on in the classrooms of large school systems 
or whether the respondents are less likely to know about such development in a very large school 
system. 

Which LEA office or department has responsibility for developing performance 
assessments? 

Of the 19 members who responded to this question, five indicated that the responsibility rested 
with testing, evaluation, and research staff and three cited curriculum and instruction. Eleven 
members said that responsibility was shared by the two offices. 

What is the role of NATD members in the development of performance assessments? 

Figure 3 summarizes members' responses to the question, "What is your role with respect to 
performance assessment (e.g., are you the primary developer, technical consultant, trainer, etc.)?" 
The most frequently mentioned role is that of technical consultant (73.9% of LEA respondents and 
70.0% of all respondents). About a third of the respondents serve as a trainer, a coordinator of 
performance assessment development, or a data gatherer/analyst/reporter. Other frequently 
mentioned roles were primary developer and developer. Here is a sampling of how members see 
their roles: 

All of the above,³ plus collaborator, broker for workshops and training sessions, 
etc. [Educational Service District] 

All of the above. I am in charge of the development and also serve as a technical 
resource. (We have others, but they are outside consultants.) I also train 
teachers to use the assessment system. [University-based assessment center that 
develops early childhood performance assessments] 

Supervise development of systemwide performance assessments; scan and report 
assessment results; serve as technical consultant to central and school staff; 
conduct inservice training on performance assessment and portfolio development 
for teachers and principals. [LEA, 4t 1,000 students] 

Oversee all administration of assessments in district; work with Curriculum 
Department to develop events/open-ended questions; be knowledgeable and 
oversee writing and math portfolios, alternative portfolios for Special Education and 
primary portfolios for K-3. [LEA, 90,000 students] 

Technical consultant; coordinator of district efforts; district representative and 
curmudgeon to the state testing people. [LEA, 65,000 students] 



³A not unexpected response from a testing director. 
I beat the drum. Budget constraints severely impact our ability at present to 
pursue performance assessments; however, there is great enthusiasm in our 
schools. [LEA, 44,076 students] 

Supervisor of area.... [Another person, who has primary responsibility, is assisted 
by graduate students.] I guess I am mostly a cheerleader. [LEA, 44,000 students] 

Conceptual leader, trainer and consultant. [LEA, 31,000 students] 

Technical consultant.... I also conduct all scoring training workshops for teachers. 
We have mentors assigned to me that are becoming "assessment 
trainers/experts." I help candidates write assessment projects each year. [LEA, 
20,000 students] 

I did the project with the assistance of teachers...[whom] I hired to write items. 
[LEA, 18,000 students] 

We are beginning to adapt/develop performance assessments as part of our 
comprehensive evaluations. [Non-profit organization engaged in program 
evaluation] 

How were assessments and rubrics developed? 

Twenty-three members (20 from LEAs, 2 from SEAs, 1 from a college) described the process by 
which their performance assessments and rubrics were developed. Except for one LEA staffer 
whose school system purchased performance assessments from a publisher, the development 
process showed remarkable uniformity. Tasks and rubrics were generally developed by teachers 
or curriculum staff (or, much less frequently, by measurement specialists with input from those 
groups). Teachers were an integral part of the development process in nearly all cases. About 
a third of the LEAs reported that they adapted existing rubrics obtained from their states or other 
outside agencies. Many members reported a painstaking, iterative process of consensus building, 
reviews, pilots, analyses, and revisions. Some of their responses are given below: 

Writing objectives were identified by English teachers. Prompts were written by 
committee of English teachers and field tested. Sample of field test papers was 
used by committee to draft preliminary scoring rubric. Prompts are refined for 
districtwide administration. A random sample of papers from a stratified random 
sample of schools is pulled and used to refine scoring rubric and prepare training 
packets for readers. [LEA with 616,000 students, writing assessment at grades 7-12] 

Literature-Based Writing Process Assessment developed by teachers over the past 
four years. Revised rubric in 1990 to use a variation of the NWEA 6-trait 5-point 
rubric for writing. [LEA with 32,000 students, writing assessment at grades 1-8] 

We recruited a large group (approximately 40 teachers and administrators) to 
design the assessments. After spending about half the school year in research 
and assessment training, we decided that the majority of our work initially would 
focus on criteria writing. Tasks are more easily found, borrowed, or purchased 
from other sources. Deciding what to judge and what to look for in student work 
and behavior would be the most important first step. We decided that standards 
could only be set after the tasks and assessment are field-tested and validated, 
at which point the question of "how good is good enough?" can be addressed. 
[LEA with 32,000 students, writing and critical thinking assessments at grades 9-12] 

One individual with content expertise for preliminary draft; reviewed and revised 
by all appropriate grade level instructors; final draft by small committee. [LEA with 
12,500 students, reading assessments at grades 1-6, writing assessments at 
grades 1-6 and 12] 

Teachers developed them based on Texas essential elements and district 
objectives. They were piloted and revisions made. Instruction and assessment 
were combined. [LEA with 18,000 students, listening and speaking assessments 
at grades 1-6] 

Committee of teachers at grades K-3: (1) researched into performance assessment 
and best practices currently being used; (2) established outcomes for primary 
mathematics; (3) selected tasks that validate these outcomes; (4) designed 
rubrics for scoring; (5) field tested and revised based on teacher comments. [LEA 
with 8,413 students, math assessments at grades 1-2] 

How did NATD members investigate the technical quality of performance assessments? 

Eighteen of those surveyed (62.1% of those developing performance assessments) indicated that 
they had investigated the reliability and/or validity of their assessments. Of the 14 respondents 
who gave specific descriptions of the studies they conducted, ten had measured interrater 
reliability. Validity studies were mentioned much less frequently. The ways of investigating 
validity included: 

• comparison of scores with final course grades and a study of whether students 
enrolled in higher level courses scored higher [college] 

• correlation with objective writing assessment [LEA] 

• disaggregation of results by gender, ethnicity and language classification [LEA] 

• multi-faceted analysis of ratings across task, rater and content using item response 
theory [SEA] 

• gathering evidence recommended by Linn, Baker and Dunbar (1991)⁴ regarding 
consequences, generalizability, fairness, cognitive complexity, meaningfulness, 
content quality and coverage, and cost justification [university-based assessment 
center constructing prekindergarten-grade 1 assessments] 

⁴Linn, R., Baker, E., & Dunbar, S. (1991). Complex, performance-based assessment: 
Expectations and validation criteria. Educational Researcher, 20(8), 15-21. 
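Ten respondents reported measuring interrater reliability, but the survey responses do not say which statistic each of them used. As an illustration only, the Python sketch below computes three indices commonly reported for rubric scores: exact agreement, adjacent agreement (scores within one point), and Cohen's kappa, which corrects agreement for chance. The function names and the two raters' scores on a 4-point rubric are hypothetical, not survey data.

```python
from collections import Counter
from typing import Sequence


def exact_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Proportion of papers that both raters gave the identical score."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def adjacent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Proportion of papers whose two scores differ by at most one point."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty score lists of equal length")
    return sum(abs(x - y) <= 1 for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), agreement corrected for chance."""
    n = len(a)
    p_o = exact_agreement(a, b)  # also validates the inputs
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over every score value either rater used.
    p_e = sum(counts_a[s] * counts_b[s] for s in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        # Both raters gave every paper the same single score; kappa is
        # undefined here, and this sketch reports 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical scores from two raters on a 4-point rubric (not survey data).
rater_1 = [4, 3, 3, 2, 1, 4, 2, 3, 3, 2]
rater_2 = [4, 3, 2, 2, 1, 4, 2, 3, 4, 2]

if __name__ == "__main__":
    print(f"exact agreement:    {exact_agreement(rater_1, rater_2):.2f}")
    print(f"adjacent agreement: {adjacent_agreement(rater_1, rater_2):.2f}")
    print(f"Cohen's kappa:      {cohens_kappa(rater_1, rater_2):.2f}")
```

With these made-up scores the raters agree exactly on 8 of 10 papers and within one point on all 10, while kappa comes out near 0.73 because some agreement would be expected by chance alone. Reporting exact and adjacent agreement together is common for holistic writing scores; kappa is one of several chance-corrected alternatives.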

Is student performance tied to any significant consequences? 

Except at the high school level, members report that performance assessments are generally 
low-stakes tests, at least from the student's point of view. Of the 18 LEA respondents, three said that 
students must currently pass a performance assessment in order to graduate, two reported that 
a similar requirement will be implemented soon, and four indicated that passing a performance 
assessment was needed for certification of competency at a high school grade other than 12. 
Four members reported that performance assessments were used to determine placement into 
courses or special programs (e.g., remedial, gifted), three indicated that results will be linked to 
school accreditation, and two said that performance assessments were used in making promotion 
decisions. 

What types of rubrics are used? 

Nineteen members submitted one or more scoring rubrics, not all of which were developed by the 
member or his/her organization. Not surprisingly, most of the rubrics are for writing assessments. 
Of the writing rubrics, 15 were analytical (i.e., separate scores are assigned for specific features 
of the writing), four were holistic (i.e., a single score is assigned for overall performance) and six used 
both analytical and holistic ratings. Only one member submitted writing rubrics that were prompt 
specific. Most of the other rubrics submitted were for assessments administered in the early 
elementary grades. These include tasks in a variety of subject areas, progress reports, journals 
and student self-assessments. The rubrics for writing, reading, mathematics, listening, speaking 
and science are described in Table 4. 

[Insert Table 4 about here] 

Which procedures have proven successful for developing assessment tasks and rubrics? 

When asked what advice they would give to others who need to develop performance 
assessments, nearly all of the 20 respondents (all but two from LEAs) mentioned getting 
extensive teacher input and allowing enough time. Here are some of their comments: 

Consider test development to be formative and subject to much revision. For the 
rubric: it is important to reach a consensus (but recognize that there will always 
be outliers!). [college] 

Involve a broad base of teachers in the development of tasks and rubrics after 
"umbrella" district objectives and "standards" have been established. [LEA, 
618,000 students] 

Teacher input is critical. Allow enough time to pilot and revise as much as 
necessary. Keep rubrics simple; teachers seem to prefer fewer, more global 
ratings to a larger number of more detailed ones. A single performance 
assessment isn't going to provide all the information needed; use multiple 
measures and consider combining performance assessments with more traditional 
measures. [LEA, 411,000 students] 



Provide initial training for raters until they score reliably each time/session they 
work. (Even the best get "rusty.") [LEA, 55,000 students] 

Don't re-invent the wheel; use a developed model and modify as necessary. Read 
the literature. Do involve the teachers as local readers; it definitely improves 
instruction. [LEA, 43,000 students] 

Staff development activities and incentives/grants for developing tasks/rubrics for 
classroom or school use. [LEA, 32,000 students] 

Need some form of staff development when involving teachers; requires 
administrative leadership. [LEA, 31,524 students] 

Good training and discussion about the relevant dimensions and the range of 
scores is absolutely necessary to good performance assessment. Tasks seem to 
be relatively easier to create or adapt once the criteria are established (although 
the field-testing may show more problems with task selection than we anticipate!). 
A second hurdle to overcome is the conviction that performance assessments can 
provide a sufficient amount of information by themselves to document student 
learning, without evidence from additional, more traditional measures. Time and 
again teachers want to make decisions about students based upon one writing 
prompt, for example, in spite of evidence that the information is very narrow. One 
thing that helped this was gathering empirical evidence and showing teachers that 
the amount of error was very large when minimal amounts of data are used. [LEA, 
24,000 students] 

Top down doesn't work. Collaborative models are best. Teachers are motivated 
to find new assessments to support areas on our new report card. They have to 
feel confident when explaining scores/grades to parents. [LEA, 20,000 students] 

Each teacher needs ownership and buy-in. Get input from everyone at some 
stage in the process, even if it's just a final "read and comment" request. [LEA, 
18,000 students] 

[1] Find balance between assessing via individual interviews and integrating with 
instruction. [2] Keep it teacher based; trust teachers. [3] Defining criteria is 
hardest and most rewarding task for teachers who participate. [4] Build 
consensus. [LEA, 14,000 students] 

Teachers, time and district commitment of considerable financial resources 
needed. [LEA, 13,400 students] 

Tests need lots of pilot time for review by people who actually administer them. 
[LEA, 12,500 students] 
We have worked with groups of teachers who then sought input from their peers 
before finalizing. [LEA, 5,500 students] 

Successful: Lots of involvement from teachers; repeated pilot studies and rounds 
of revisions (one or two isn't enough!). Unsuccessful: Trying to go too fast! This 
type of development takes at least twice as much time (and probably more) than 
development of traditional tests. Teacher training must be comprehensive and 
ongoing. [university-based assessment center constructing prekindergarten-grade 
1 assessments] 

Tasks must be clearly stated and must cover a wide range of abilities and skills; 
rubrics must be open enough to capture richness of student responses; scoring 
must represent clear rating scales. [SEA] 
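One respondent (LEA, 24,000 students) persuaded teachers that a single writing prompt gives very narrow information by showing how large the error is when minimal data are used. The survey does not say what analysis that district ran. One standard way to make the same point, offered here as an illustration and not as that district's method, is the Spearman-Brown prophecy formula, which projects the reliability of a score averaged over k parallel prompts from the reliability of one prompt. The single-prompt reliability of 0.50 below is a hypothetical value.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Projected reliability of the mean of k parallel measures (Spearman-Brown)."""
    if not 0.0 <= r_single <= 1.0:
        raise ValueError("r_single must be between 0 and 1")
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r_single / (1 + (k - 1) * r_single)


if __name__ == "__main__":
    r_one_prompt = 0.50  # hypothetical reliability of a single writing prompt
    for k in (1, 2, 3, 4):
        print(f"{k} prompt(s): projected reliability {spearman_brown(r_one_prompt, k):.2f}")
```

Under this assumption, moving from one prompt to four raises the projected reliability from 0.50 to 0.80. The formula assumes the prompts are strictly parallel, which real writing prompts only approximate, so the projection is best read as an optimistic upper bound.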

Discussion 

It is clear that, while there is great interest in performance assessment, development activities (at 
least at a district-wide level) are not being carried out by most of the respondents and are hardly 
being carried out at all in subjects other than writing. The reasons for this are unclear; the cause 
may have to do with a lack of time or money; insufficient dissatisfaction among decision-makers 
with existing assessments; the feeling that performance assessments should not be imposed in 
schools in a "top-down" fashion; or the possibility that such development is being carried out at 
either the classroom or the state level. The survey only addressed the issue of whether 
performance assessments were being developed, not whether they were being used. In 
retrospect, it would have been better to ask whether performance assessment was being used, 
and, if not, then why not. It would also be interesting to know to what extent portfolios are being 
used (only a few respondents mentioned them) and the processes by which portfolios become 
valid and reliable measures. 



Table 1 
NATD Survey Respondents 

Organizational affiliation            N responding   N who have developed      N who have not developed 
                                                     performance assessments   performance assessments 
College/university                         6                 2                          4 
Educational service district (ESD)         3                 0                          3 
Local educational agency (LEA)            49                22                         27 
State educational agency (SEA)             2                 2                          0 
Publisher                                  1                 1                          0 
Other                                      3                 1                          2 
Total                                     64                28 (43.75%)                36 (56.25%) 
                                          Large LEAs (N = 21)   Smaller LEAs (N = 28) 
N that have developed writing 
performance assessments                       9 (42.9%)             9 (32.1%) 
N that have developed math 
performance assessments                         (9.5%)              6 (21.4%) 
N that have developed reading ...                                    (46.4%) 
N that have ... least one 
Figure 3 

NATD Members' Roles with Regard to Performance Assessment Development 

[Figure not reproduced; panels: all respondents (N = 30) and LEA respondents (N = 23)] 